We introduce new quantities for exploratory causal inference between bivariate time series. The quantities, called penchants and leanings, are computationally straightforward to apply, follow directly from assumptions of probabilistic causality, do not depend on any assumed models for the time series generating process, and do not rely on any embedding procedures; these features may provide a clearer interpretation of the results than those from existing time series causality tools. The penchant and leaning are computed based on a structured method for computing probabilities.


I. INTRODUCTION 

Many scientific disciplines rely on observational data from systems in which it is difficult or impossible to implement controlled experiments or to control interventions. For example, there is no current technology that can control the interaction between the solar wind and the magnetic field measured at the surface of Earth, so space weather studies rely on data collected without performing controlled experiments. As a result, causal inference with observational data sets from such systems is difficult, and the need to identify causal relationships, given the weakness of correlation in doing so, has led to the development of several different time series causality tools [?].

Causal inference in time series involves finding “driving” relationships between different time series signals. Showing the existence, rather than the exact nature, of the driving relationship between the signals is often the primary goal. Thus, words like “driving”, “causality”, and related terms typically do not have straightforward analogs to the same terms used in other fields [?], e.g., theoretical quantum (e.g., [?]) or classical mechanics (e.g., [13]).

The development and study of causal inference techniques is often called time series causality. Most techniques fall into four broad categories related to either transfer entropy [?], Granger causality [?], state space reconstruction (SSR) [?], or lagged cross-correlation [?]. These techniques have found application in a wide range of fields including neuroscience (e.g., [14]), economics (e.g., [?]), and climatology (e.g., [17]).

In this article, we introduce a time series causality technique derived directly from the definition of probabilistic causality [13]. The technique is applied to synthetic and empirical bivariate time series data sets with known, or intuitive, causal relationships. We discuss the strengths and weaknesses of the technique and demonstrate how it may be useful for causal inference with empirical data from systems in which it is difficult or impossible to implement controlled experiments or to control interventions.


II. CAUSAL PENCHANT 

We define the causal penchant $\rho_{EC} \in [-1, 1]$ as

$$\rho_{EC} := P(E|C) - P(E|\bar{C}). \qquad (1)$$

The motivation for this expression is in the straightforward interpretation of $\rho_{EC}$ as a causal indicator [19]; i.e., if C causes (or drives) E, then $\rho_{EC} > 0$, and if $\rho_{EC} \le 0$, then the direction of causal influence is undetermined. If effect E is assumed to be measured in one time series and the cause C is assumed to be measured in a different time series, then the direction of causal influence can be determined by comparing various penchants when each time series is assigned to be the cause C.

Eqn. 1 can be rewritten using Bayes’ theorem

$$P(E|C) = P(C|E)\frac{P(E)}{P(C)} \qquad (2)$$

and the definitions of probability complements

$$P(\bar{C}) = 1 - P(C) \qquad (3)$$

$$P(\bar{C}|E) = 1 - P(C|E). \qquad (4)$$

Using Eqn. 4 with Eqn. 2 gives

$$P(\bar{C}|E) = 1 - P(E|C)\frac{P(C)}{P(E)}.$$

Inserting this into Eqn. 2 written in terms of $\bar{C}$,

$$P(E|\bar{C}) = P(\bar{C}|E)\frac{P(E)}{P(\bar{C})},$$

yields an alternative form of the second term in Eqn. 1,

$$P(E|\bar{C}) = \left(1 - P(E|C)\frac{P(C)}{P(E)}\right)\frac{P(E)}{1 - P(C)}.$$

This expression gives a penchant that requires only a single conditional probability estimate:

$$\rho_{EC} = P(E|C)\left(1 + \frac{P(C)}{1 - P(C)}\right) - \frac{P(E)}{1 - P(C)}. \qquad (5)$$


The penchant is non-parametric in the sense that there is no assumed functional relationship between the time series being investigated. This form of the penchant will be used along with a structured method for counting and notions from probabilistic causality to infer which time series in a given pair might be seen as “driving” the other. Our motivation for the penchant is the need for a time series causality quantity that is easily computed.
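The equivalence of Eqn. 1 and Eqn. 5 can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function names and the joint counts are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def penchant(p_e_given_c, p_c, p_e):
    """Causal penchant in the single-conditional form of Eqn. 5."""
    if p_c in (0, 1) or p_e in (0, 1):
        raise ValueError("penchant is undefined when P(C) or P(E) is 0 or 1")
    return p_e_given_c * (1 + p_c / (1 - p_c)) - p_e / (1 - p_c)


def penchant_direct(p_e_given_c, p_e_given_not_c):
    """Causal penchant straight from the definition in Eqn. 1."""
    return p_e_given_c - p_e_given_not_c


# Hypothetical joint counts over 20 observations (illustration only):
# (C, E), (C, not E), (not C, E), (not C, not E)
n_ce, n_c_not_e, n_not_c_e, n_not_c_not_e = 6, 2, 3, 9
total = n_ce + n_c_not_e + n_not_c_e + n_not_c_not_e

p_c = Fraction(n_ce + n_c_not_e, total)
p_e = Fraction(n_ce + n_not_c_e, total)
p_e_given_c = Fraction(n_ce, n_ce + n_c_not_e)
p_e_given_not_c = Fraction(n_not_c_e, n_not_c_e + n_not_c_not_e)

rho_eq5 = penchant(p_e_given_c, p_c, p_e)
rho_eq1 = penchant_direct(p_e_given_c, p_e_given_not_c)
print(rho_eq5, rho_eq1)  # both forms give the same value
```

Exact rational arithmetic is used here so that the two forms can be compared with equality rather than with a floating-point tolerance.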

For the calculations in the following sections, the penchant is not defined if $P(C)$ or $P(\bar{C})$ is zero (because the conditionals in Eqn. 1 would be undefined). Thus, the penchant is not defined if $P(C) = 0$ or if $P(C) = 1$. The former condition corresponds to an inability to determine causal influence between two time series when a cause does not appear in one of the series; the latter condition is interpreted as an inability to determine causal influence between two time series if one is constant. The use of Bayes’ theorem in the derivation of Eqn. 5 implies that the penchant is not defined if $P(E)$ or $P(\bar{E})$ is zero.

The method given in this work uses no a priori assignment of “cause” or “effect” to a given time series pair when using penchants for causal inference. So, operationally, the constraints on $P(C)$ and $P(E)$ only mean that the penchant is undefined between pairs of time series where one series is constant.

The penchant definition includes $P(E|\bar{C})$, which is the probability of an assumed effect occurring given the absence of the assumed cause. It has been argued that causality determination requires an intervention, and the absence of an assumed cause is unobservable, which implies the occurrence probability of the assumed effect should be conditioned on performing or not performing an action rather than on the presence or lack of an assumed cause [?]. Causal relations have been described as “a relation among events” [?], again implying the absence of an assumed cause cannot be used to identify causal relationships. These issues have been a part of probabilistic definitions of causality at least since the 1960s [?], and we do not attempt to solve them in this article. We circumvent these philosophical issues by using an expression that removes any conditioning on the absence of an assumed cause and the condition that the penchant is undefined when $P(C) = 0$, $P(C) = 1$, $P(E) = 0$, or $P(E) = 1$.

Although Eqn. 5 circumvents the issue of $P(E|\bar{C})$ being unobservable, it does not account for confounding. The assumption that $P(C)$ can be estimated from a scalar time series may be seen as an oversimplification of the dynamics. That is, it may be seen as an assumption that the assumed effect is only caused by the assumed cause. In this case, the penchant may be better interpreted as an indication of predictability rather than causality (similar to arguments made regarding Granger causality [11]). This issue will not be addressed in this article; we emphasize, however, that we use terms such as cause, effect, causal inference, and related terms to specifically refer to the penchant and leaning quantities.

In this article, we seek to determine if the penchant is a useful quantity for the identification of causal relationships between time series. Our goal is to identify the usefulness of the penchant (and the leaning introduced below) for exploratory causal inference, i.e., inference intended to determine if (and what) causal structure may be present in a time series pair but not to prove or confirm such structure. There are scenarios in which any time series causality tool such as the penchant may incorrectly assign causal structure or may incorrectly not assign causal structure [?]. Furthermore, proof of causal relationships is often considered impossible with data alone [?].

The goal of this work is to draw as much information as possible from the given data to, e.g., guide the design of future experiments.


III. CAUSAL LEANING 

Consider the assignment of X as the cause, C, and Y as the effect, E. If $\rho_{EC} > 0$, then the probability that X drives Y is higher than the probability that it does not, which is stated more succinctly as X has a penchant to drive Y or $X \to Y$.

It is possible, however, that the penchant could be positive when X is assumed as the effect and Y is assumed as the cause. (An example of this is given in Section IV C.) The leaning addresses this via

$$\lambda_{EC} := \rho_{EC} - \rho_{CE} \qquad (6)$$

for which $\lambda_{EC} \in [-2, 2]$. A positive leaning implies the assumed cause C drives the assumed effect E more than the assumed effect drives the assumed cause, a negative leaning implies the effect E drives the assumed cause C more than the assumed cause drives the assumed effect, and a zero leaning yields no causal information.

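The possible outcomes are notated as

$$\lambda_{EC} > 0, \quad \{C,E\} = \{X,Y\} \Rightarrow X \to Y$$

$$\lambda_{EC} < 0, \quad \{C,E\} = \{X,Y\} \Rightarrow Y \to X$$

$$\lambda_{EC} = 0, \quad \{C,E\} = \{X,Y\} \Rightarrow \text{no conclusion}$$

with $\{C,E\} = \{A,B\}$ meaning A is the assumed cause and B is the assumed effect.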

If $\lambda_{EC} > 0$ with X as the assumed cause and Y as the assumed effect, then X has a larger penchant to drive Y than Y does to drive X. That is, $\lambda_{EC} > 0$ implies that the difference between the probability that X drives Y and the probability that it does not is higher than the difference between the probability that Y drives X and the probability that it does not.










The leaning is a function of four probabilities, $P(C)$, $P(E)$, $P(C|E)$, and $P(E|C)$. The usefulness of the leaning for causal inference will depend on an effective method for estimating these probabilities from time series and a more specific definition of the cause-effect assignment within the time series pair. An operational definition of C and E will need to be drawn directly from the time series data if the leaning is to be useful for causal inference. Such assignments, however, may be difficult to develop and may be considered arbitrary without some underlying theoretical support. For example, if the cause is $x_{t-1}$ and the effect is $y_t$, then it may be considered unreasonable to provide a causal interpretation of the leaning without theoretical support that X may be expected to drive Y on the time scale of $\Delta t = 1$. This issue is, however, precisely one of the reasons for divorcing the causal inference proposed in this work (i.e., exploratory causal inference) from traditional ideas of causality, as was explained in the second paragraph of the introduction. Statistical tools are associational, and cannot be given formal causal interpretation without the use of assumptions and outside theories (see [?] for an in-depth discussion of these ideas). In practice, many different potential cause-effect assignments may be used to calculate different leanings, which may then be compared as part of the causal analysis of the data.

In this article, the probabilities required for the leaning calculation will be estimated from the data straightforwardly through a counting/binning procedure, and the cause-effect assignment may be varied but will always use a simple lag structure to avoid unnecessarily complex computations. The process of estimating probabilities from time series data to draw causal inferences can be subtle; see the work of Schreiber et al. for a discussion of this issue in the context of transfer entropies [?].

IV. MOTIVATING EXAMPLE 

Consider a time series pair $\{X, Y\}$ with

$$X = \{x_t \mid t = 0,1,2,\ldots,9\} = \{0,0,1,0,0,1,0,0,1,0\}$$

$$Y = \{y_t \mid t = 0,1,2,\ldots,9\} = \{0,0,0,1,0,0,1,0,0,1\}.$$

Because $y_t = x_{t-1}$, one may conclude that X drives Y. However, to show this result using a leaning calculation requires first a calculation using the cause-effect assignment $\{C,E\} = \{X,Y\}$. For consistency with the intuitive definition of causality, we require that a cause must precede an effect. It follows that a natural assignment may be $\{C,E\} = \{x_{t-l}, y_t\}$ for $1 \le l \le t \le 9$. This cause-effect assignment will be referred to as the l-standard assignment.

The cause-effect assignment is an assignment of a given structure or feature of the data in one time series as the “cause” and another structure or feature of the data in the other time series as the “effect”. For example, in the l-standard cause-effect assignment, the cause is the lag l time step in one time series and the effect is the current time step in the other. The leaning compares the symmetric application of these cause-effect definitions to the time series pair. So, for the above example of $\{C,E\} = \{x_{t-l}, y_t\}$, the first penchant will be calculated using $\{C,E\} = \{x_{t-l}, y_t\}$ and the second will be calculated using $\{C,E\} = \{y_{t-l}, x_t\}$. The second penchant is not the direct interchange of C and E from the first penchant because such an interchange would violate the assumption that a cause must precede an effect. For example, if the first penchant in the leaning calculation is calculated using $\{C,E\} = \{x_{t-l}, y_t\}$, then the second penchant is not calculated using $\{C,E\} = \{y_t, x_{t-l}\}$ because the definition of the effect, $x_{t-l}$, precedes the definition of the cause, $y_t$.
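The l-standard assignment and its symmetric counterpart can be made concrete by listing the (cause, effect) pairs that each one produces. This is a minimal sketch; the helper name `l_standard_pairs` is an assumption introduced here for illustration and does not come from the original work.

```python
def l_standard_pairs(cause, effect, lag=1):
    """Return (cause[t - lag], effect[t]) for every valid t.

    The cause always precedes the effect by `lag` steps, so swapping the
    arguments gives the symmetric assignment used for the second penchant.
    """
    if len(cause) != len(effect):
        raise ValueError("series must have the same length")
    if not 1 <= lag < len(cause):
        raise ValueError("lag must satisfy 1 <= lag < series length")
    return list(zip(cause[:-lag], effect[lag:]))


X = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
Y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]

x_drives_y = l_standard_pairs(X, Y, lag=1)  # {C, E} = {x_{t-1}, y_t}
y_drives_x = l_standard_pairs(Y, X, lag=1)  # {C, E} = {y_{t-1}, x_t}
print(x_drives_y)
print(y_drives_x)
```

Both lists have one entry fewer than the original series because the first effect value has no preceding cause at lag 1.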

A. Defining penchants 

Given $\{X, Y\}$, one possible penchant (i.e., Eqn. 5) that can be defined using the 1-standard assignment is

$$\rho_{y_t=1,x_{t-1}=1} = \kappa\left(1 + \frac{P(x_{t-1}=1)}{1 - P(x_{t-1}=1)}\right) - \frac{P(y_t=1)}{1 - P(x_{t-1}=1)},$$

with $\kappa = P(y_t = 1 | x_{t-1} = 1)$. Another penchant defined using this assignment is $\rho_{y_t=0,x_{t-1}=0}$ with $\kappa = P(y_t = 0 | x_{t-1} = 0)$. These two penchants are called observed penchants because they correspond to conditions that were found in the measurements.

Two other penchants have $\kappa = P(y_t = 0 | x_{t-1} = 1)$ and $\kappa = P(y_t = 1 | x_{t-1} = 0)$. These penchants are associated with unobserved conditions. Based on the values for these two penchants, $\kappa = 0 \Rightarrow \rho_{y_t x_{t-1}} < 0$, which is consistent with the claim that the effect, $y_t = 0$ or 1, is not caused by the postulated cause, $x_{t-1} = 1$ or 0, respectively.

B. Computing penchants 

The probabilities in the penchant calculations can be estimated from time series using counts, e.g.,

$$P(y_t = 1 | x_{t-1} = 1) = \frac{n_{EC}}{n_C} = \frac{3}{3} = 1,$$

where $n_{EC}$ is the number of times $y_t = 1$ and $x_{t-1} = 1$ appears in $\{X, Y\}$, and $n_C$ is the number of times the assumed cause, $x_{t-1} = 1$, has appeared in $\{X, Y\}$.

Estimating the other two probabilities in this penchant calculation using frequency counts from $\{X, Y\}$ requires accounting for the assumption that the cause must precede the effect by shifting X and Y into $\tilde{X}$ and $\tilde{Y}$ such that, for any given t, $\tilde{x}_t$ precedes $\tilde{y}_t$. For this example, the shifted sequences are

$$\tilde{X} = \{0,0,1,0,0,1,0,0,1\}$$

$$\tilde{Y} = \{0,0,1,0,0,1,0,0,1\}$$

which are both shorter than their counterparts above by a single value because the penchants are being calculated using the 1-standard cause-effect assignment. It follows that $\tilde{x}_t = x_{t-1}$ and $\tilde{y}_t = y_t$. The probabilities are then

$$P(y_t = 1) = \frac{n_E}{L} = \frac{3}{9} \qquad (7)$$

and

$$P(x_{t-1} = 1) = \frac{n_C}{L} = \frac{3}{9}, \qquad (8)$$

where $n_C$ is the number of times $\tilde{x}_t = 1$, $n_E$ is the number of times $\tilde{y}_t = 1$, and L is the (“library”) length of $\tilde{X}$ and $\tilde{Y}$.

C. Mean observed leaning

The two observed penchants in this example under the assumption that X causes Y (with l = 1) are

$$\rho_{y_t=1,x_{t-1}=1} = 1 \qquad (9)$$

and

$$\rho_{y_t=0,x_{t-1}=0} = 1.$$

The observed penchants when Y is assumed to cause X are

$$\rho_{x_t=1,y_{t-1}=0} = \frac{3}{7},$$

$$\rho_{x_t=0,y_{t-1}=1} = \frac{3}{7},$$

and

$$\rho_{x_t=0,y_{t-1}=0} = -\frac{3}{7}.$$

The mean observed penchant is the algebraic mean of the observed penchants. For X causes Y, it is

$$\langle\rho_{y_t,x_{t-1}}\rangle = \frac{1}{2}\left(\rho_{y_t=1,x_{t-1}=1} + \rho_{y_t=0,x_{t-1}=0}\right) = 1$$

and for Y causes X, it is

$$\langle\rho_{x_t,y_{t-1}}\rangle = \frac{1}{3}\left(\rho_{x_t=1,y_{t-1}=0} + \rho_{x_t=0,y_{t-1}=1} + \rho_{x_t=0,y_{t-1}=0}\right) = \frac{1}{7}.$$

The mean observed leaning that follows from the definition of the mean observed penchants is

$$\langle\lambda_{y_t,x_{t-1}}\rangle = \langle\rho_{y_t,x_{t-1}}\rangle - \langle\rho_{x_t,y_{t-1}}\rangle \qquad (10)$$

$$= \frac{6}{7}. \qquad (11)$$

The positive leaning implies the probability that $x_{t-1}$ drives $y_t$ is higher than the probability that $y_{t-1}$ drives $x_t$; i.e., $X \to Y$ given the 1-standard cause-effect assignment. This result is expected and agrees with the intuitive definition of causality in this example.

The weighted mean observed penchant is defined similarly to the mean observed penchant, but each penchant is weighted by the number of times it appears in the data; e.g.,

$$\langle\rho_{y_t,x_{t-1}}\rangle_w = \frac{1}{L}\left(n_{y_t=1,x_{t-1}=1}\,\rho_{y_t=1,x_{t-1}=1} + n_{y_t=0,x_{t-1}=0}\,\rho_{y_t=0,x_{t-1}=0}\right) = 1$$

and

$$\langle\rho_{x_t,y_{t-1}}\rangle_w = \frac{1}{L}\left(n_{x_t=1,y_{t-1}=0}\,\rho_{x_t=1,y_{t-1}=0} + n_{x_t=0,y_{t-1}=1}\,\rho_{x_t=0,y_{t-1}=1} + n_{x_t=0,y_{t-1}=0}\,\rho_{x_t=0,y_{t-1}=0}\right) = \frac{3}{63},$$

where $n_{a,b}$ is the number of times the assumed cause a appears with the assumed effect b and L is the library length of $\tilde{X}$ (i.e., $L = N - l$ where N is the library length of X and l is the lag used in the l-standard cause-effect assignment).

The weighted mean observed leaning follows naturally as

$$\langle\lambda_{y_t,x_{t-1}}\rangle_w = \langle\rho_{y_t,x_{t-1}}\rangle_w - \langle\rho_{x_t,y_{t-1}}\rangle_w = \frac{60}{63}.$$

For this example, $\langle\lambda_{y_t,x_{t-1}}\rangle_w > 0 \Rightarrow X \to Y$ as expected.
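The counting behind the mean and weighted mean observed leanings can be sketched in a few lines. The code below is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the penchant is evaluated from the definition $P(E|C) - P(E|\bar{C})$, which is algebraically the same as Eqn. 5.

```python
from collections import Counter
from fractions import Fraction


def lagged_pairs(cause, effect, lag=1):
    """(cause[t - lag], effect[t]) pairs under the l-standard assignment."""
    return list(zip(cause[:-lag], effect[lag:]))


def penchant(pairs, c, e):
    """rho = P(E|C) - P(E|not C), with probabilities estimated by counting."""
    n_c = sum(1 for a, _ in pairs if a == c)
    n_not_c = len(pairs) - n_c
    if n_c == 0 or n_not_c == 0:
        raise ValueError("penchant undefined: P(C) is 0 or 1")
    n_ec = sum(1 for a, b in pairs if a == c and b == e)
    n_e_not_c = sum(1 for a, b in pairs if a != c and b == e)
    return Fraction(n_ec, n_c) - Fraction(n_e_not_c, n_not_c)


def observed_penchants(pairs):
    """Map each observed (cause, effect) pair to (count, penchant)."""
    return {ce: (n, penchant(pairs, *ce)) for ce, n in Counter(pairs).items()}


def mean_observed_penchant(pairs):
    obs = observed_penchants(pairs)
    return sum(rho for _, rho in obs.values()) / len(obs)


def weighted_mean_observed_penchant(pairs):
    obs = observed_penchants(pairs)
    return sum(n * rho for n, rho in obs.values()) / len(pairs)


X = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
Y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]

xy = lagged_pairs(X, Y)  # X assumed to cause Y
yx = lagged_pairs(Y, X)  # Y assumed to cause X

mean_leaning = mean_observed_penchant(xy) - mean_observed_penchant(yx)
weighted_leaning = (weighted_mean_observed_penchant(xy)
                    - weighted_mean_observed_penchant(yx))
print(mean_leaning, weighted_leaning)  # 6/7 and 20/21 (= 60/63)
```

`Fraction` reduces 60/63 to 20/21 when printing; the two are the same value.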

Conceptually, the weighted mean observed penchant is preferred to the mean penchant because it accounts for the frequency of observed cause-effect pairs within the data, which is assumed to be a predictor of causal influence. For example, given some pair $\{A, B\}$, if it is known that $a_{t-1}$ causes $b_t$ and both $b_t = 0 \mid a_{t-1} = 0$ and $b_t = 0 \mid a_{t-1} = 1$ are observed, then comparison of the frequencies of occurrence is used to determine which of the two pairs represents the cause-effect relationship.

For this example, the weighted mean observed leaning provides the same causal inference as the mean observed leaning. The weighted mean calculation will be used in the examples of the following sections.

D. Unobserved penchants

The unobserved penchants for l = 1 for X causes Y are

$$\rho_{y_t=1,x_{t-1}=0} = -1$$

$$\rho_{y_t=0,x_{t-1}=1} = -1$$

and for Y causes X is

$$\rho_{x_t=1,y_{t-1}=1} = -\frac{3}{7}.$$

These values can be incorporated into the averaging calculation to yield a mean total penchant; i.e., for X causes Y

$$\langle\langle\rho_{y_t,x_{t-1}}\rangle\rangle = \frac{1}{4}\left(\rho_{y_t=1,x_{t-1}=1} + \rho_{y_t=0,x_{t-1}=0} + \rho_{y_t=1,x_{t-1}=0} + \rho_{y_t=0,x_{t-1}=1}\right) = 0$$

and for Y causes X

$$\langle\langle\rho_{x_t,y_{t-1}}\rangle\rangle = \frac{1}{4}\left(\rho_{x_t=1,y_{t-1}=1} + \rho_{x_t=0,y_{t-1}=0} + \rho_{x_t=1,y_{t-1}=0} + \rho_{x_t=0,y_{t-1}=1}\right) = 0.$$

Thus, the mean total leaning (defined analogous to Eqn. 10) is $\langle\langle\lambda_{y_t,x_{t-1}}\rangle\rangle = \langle\langle\rho_{y_t,x_{t-1}}\rangle\rangle - \langle\langle\rho_{x_t,y_{t-1}}\rangle\rangle = 0$. No causal inference can be made with a leaning of zero because it implies $\langle\langle\rho_{y_t,x_{t-1}}\rangle\rangle = \langle\langle\rho_{x_t,y_{t-1}}\rangle\rangle$. Thus X does not have a higher penchant to drive Y than Y does to drive X, given the cause-effect assignment used in the leaning calculation. Such a conclusion would not be useful for causal inference, which implies the mean total leaning is not useful for causal inference in this example.

E. Cause-effect assignment independence

The causal inference above assumed a cause-effect relationship was known to be correct. It can be shown, however, that causal inference is independent of the assumed cause-effect relationship. For example, consider the cause-effect assignment $\{C,E\} = \{y_{t-l}, x_t\}$ with l = 1. The mean observed leaning is

$$\langle\lambda_{x_t,y_{t-1}}\rangle = \langle\rho_{x_t,y_{t-1}}\rangle - \langle\rho_{y_t,x_{t-1}}\rangle = -\frac{6}{7},$$

which implies $X \to Y$, as expected for this example.

In general, $\lambda_{AB} := \rho_{AB} - \rho_{BA} \Rightarrow -\lambda_{AB} = \rho_{BA} - \rho_{AB} := \lambda_{BA}$. Thus, the causal inference is independent of which time series is initially assumed to be the cause (or effect).

F. Tolerance domains

If the example time series contained noise, then a realization of the example time series $\{X', Y'\}$ could be

$$X' = \{x'_t \mid t = 0,1,2,\ldots,9\} = \{0, 0, 1.1, 0, 0, 1, -0.1, 0, 0.9, 0\}$$

$$Y' = \{y'_t \mid t = 0,1,2,\ldots,9\} = \{0, -0.2, 0.1, 1.2, 0, 0.1, 0.9, -0.1, 0, 1\}.$$

The previous time series pair, $\{X, Y\}$, had only five observed penchants, but $\{X', Y'\}$ has more due to the noise. It can be seen in the time series definitions that $x'_t = x_t \pm 0.1 := x_t \pm \delta_x$ and $y'_t = y_t \pm 0.2 := y_t \pm \delta_y$. The weighted mean observed leaning for $\{X', Y'\}$ is $\langle\lambda_{y'_t,x'_{t-1}}\rangle_w \approx 0.19$.

If the noise is not restricted to a small set of discrete values, then the effects of noise on the leaning calculations can be addressed by using the tolerances $\delta_x$ and $\delta_y$ in the probability estimations from the data. For example, the penchant calculation in Eqn. 9 relied on estimating $P(y_t = 1 | x_{t-1} = 1)$ from the data, but if, instead, the data is known to be noisy, then the relevant probability estimate may be $P(y_t \in [1 - \delta_y, 1 + \delta_y] \mid x_{t-1} \in [1 - \delta_x, 1 + \delta_x])$.

If the tolerances, $\delta_x$ and $\delta_y$, are made large enough, then the noisy system weighted mean observed leaning, $\langle\lambda_{y'_t \pm \delta_y, x'_{t-1} \pm \delta_x}\rangle_w$, can, at least in the simple examples considered here, be made equal to the noiseless system weighted mean observed leaning, i.e., $\langle\lambda_{y_t,x_{t-1}}\rangle_w$.
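A tolerance-domain probability estimate of this kind can be sketched by replacing exact matches with interval membership when counting. The code below is only an illustration under stated assumptions: the function names are hypothetical, the domains are taken to be symmetric and closed, and a small `eps` is added to absorb floating-point round-off at the interval edges.

```python
def in_domain(value, center, tol, eps=1e-12):
    """True if value lies in [center - tol, center + tol], up to round-off."""
    return abs(value - center) <= tol + eps


def conditional_probability(cause, effect, c, e, tol_c=0.0, tol_e=0.0, lag=1):
    """Estimate P(effect_t in e +/- tol_e | cause_{t-lag} in c +/- tol_c)."""
    pairs = list(zip(cause[:-lag], effect[lag:]))
    cause_hits = [(a, b) for a, b in pairs if in_domain(a, c, tol_c)]
    if not cause_hits:
        raise ValueError("the assumed cause is never observed")
    both_hits = [pair for pair in cause_hits if in_domain(pair[1], e, tol_e)]
    return len(both_hits) / len(cause_hits)


# Noisy realization {X', Y'} from the text
X_noisy = [0, 0, 1.1, 0, 0, 1, -0.1, 0, 0.9, 0]
Y_noisy = [0, -0.2, 0.1, 1.2, 0, 0.1, 0.9, -0.1, 0, 1]

# Exact matching sees the cause x = 1 only once, followed by y = 0.9
p_exact = conditional_probability(X_noisy, Y_noisy, c=1, e=1)
# Tolerances delta_x = 0.1 and delta_y = 0.2 recover the noiseless estimate
p_tol = conditional_probability(X_noisy, Y_noisy, c=1, e=1,
                                tol_c=0.1, tol_e=0.2)
print(p_exact, p_tol)
```

With the tolerances set to the noise levels, the estimate of $P(y_t = 1 | x_{t-1} = 1)$ returns to the noiseless value of 1.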

Tolerance domains, however, can be set too large. If the tolerance domain is large enough to encompass every point in the time series, then the probability of the assumed cause becomes one, which leads to undefined penchants. For example, given the symmetric definition of the tolerance domain used in this section, $\delta_x = 2$ implies $P(x_{t-1} = 1 \pm \delta_x) = 1$, which implies $\langle\lambda_{y'_t,x'_{t-1}}\rangle_w$ is undefined.

This example was used to motivate the need for an understanding of the noise in the measurements, which may not always be possible. If little is known about the noise, one strategy is to calculate the leanings with several different tolerances, increasing the size of the tolerance domains to the point where the penchants become undefined, and finding the tolerance domains for which the leaning changes sign. The sizes of these domains can then be compared to suspected noise levels. This strategy, and others, will be considered in more detail in the following sections. If the noise level is known, then the task becomes much simpler and the tolerances should just be set to the known (or estimated) noise levels for the individual time series.

As mentioned in Section III, the probabilities required for the leaning calculations are estimated in this example (and all the following ones) through straightforward counting of the data. As such, the tolerance domains may be thought of as bin widths for the probability estimations. Work has been done on optimal and data-driven bin width selection procedures (see, e.g., [?]), usually in the context of finding histograms. Tolerance domains, however, may be thought of in terms of the causal inference for which the leaning is intended. The tolerance domain for the “cause” (or “effect”) is the domain in which an analyst considers the data may still reasonably be identified as a “cause” (or “effect”). It is not required to be symmetric, and the tolerance domain for one time series is not required to be equal to the tolerance domain for the other (which is seen in the example above and most of the examples that follow).


G. Stationarity dependence 


Both X and Y are stationary in the original example time series pair $\{X, Y\}$. Suppose 1,000 zeros are appended to the end of each of these time series. The additional zeros in the time series may intuitively seem to make causal inference more difficult. The probabilities required for the penchant $\rho_{y_t=1,x_{t-1}=1}$ become

$$P(y_t = 1 | x_{t-1} = 1) = \frac{3}{3} = 1,$$

$$P(y_t = 1) = \frac{3}{1009},$$

and

$$P(x_{t-1} = 1) = \frac{3}{1009}.$$

These probabilities have become much smaller but the penchant remains the same. The same is true for $\rho_{y_t=0,x_{t-1}=0}$. Despite the additional zeros, Y can still only take the values 1 or 0. This knowledge along with the above penchants implies $n_{y_t=1,x_{t-1}=1} + n_{y_t=0,x_{t-1}=0} = L$, which implies $\langle\rho_{y_t,x_{t-1}}\rangle_w = 1$. The other three observed penchants, however, do change as a result of the appended zeros. Previously, $|\rho_{x_t=1,y_{t-1}=0}| = |\rho_{x_t=0,y_{t-1}=1}| = |\rho_{x_t=0,y_{t-1}=0}| = 3/7$, but with the appended zeros, $|\rho_{x_t=1,y_{t-1}=0}| = |\rho_{x_t=0,y_{t-1}=1}| = |\rho_{x_t=0,y_{t-1}=0}| = 3/1009$. The weighted mean observed leaning, $\langle\lambda_{y_t,x_{t-1}}\rangle_w$, changes from 60/63 to approximately 1012/1009 because of the appended zeros. This value is higher than the previous value but yields the same causal inference.

Consider another non-stationary time series pair, $\{X_L, R_L\}$, where the non-stationary response signal is $R_L = \{0, 0, 0, 1, 1, 1, 2, 2, 2, 3\}$. The weighted mean observed leaning calculated under the 1-standard assignment with no tolerance domains still leads to a causal inference that agrees with intuition; i.e., $\langle\lambda_{r_t,x_{t-1}}\rangle_w \approx 0.11 \Rightarrow X_L \to R_L$ as expected. This result, however, depends on the library length of the data.
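Both checks in this subsection can be explored by direct counting. The sketch below is not the authors' code; the function names are hypothetical, the driving series for $\{X_L, R_L\}$ is assumed to be the X of the motivating example, and the penchant is evaluated as $P(E|C) - P(E|\bar{C})$ with exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction


def weighted_mean_observed_penchant(cause, effect, lag=1):
    """Count-weighted mean of rho over the observed (cause, effect) pairs."""
    pairs = list(zip(cause[:-lag], effect[lag:]))
    total = Fraction(0)
    for (c, e), n_ec in Counter(pairs).items():
        n_c = sum(1 for a, _ in pairs if a == c)
        n_not_c = len(pairs) - n_c
        if n_not_c == 0:
            raise ValueError("penchant undefined: the assumed cause is constant")
        n_e_not_c = sum(1 for a, b in pairs if a != c and b == e)
        rho = Fraction(n_ec, n_c) - Fraction(n_e_not_c, n_not_c)
        total += n_ec * rho
    return total / len(pairs)


def weighted_leaning(x, y, lag=1):
    """Weighted mean observed leaning for 'x drives y' versus 'y drives x'."""
    return (weighted_mean_observed_penchant(x, y, lag)
            - weighted_mean_observed_penchant(y, x, lag))


X = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
Y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
R_L = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]

# 1,000 zeros appended to both series
pad = [0] * 1000
forward_padded = weighted_mean_observed_penchant(X + pad, Y + pad)
leaning_padded = weighted_leaning(X + pad, Y + pad)

# Non-stationary response R_L driven by X
leaning_nonstationary = weighted_leaning(X, R_L)

print(forward_padded, float(leaning_padded), float(leaning_nonstationary))
```

In this sketch the padded leaning stays positive and slightly above one, and the $\{X_L, R_L\}$ leaning evaluates to 1/9, in line with the value of approximately 0.11 quoted above.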

$\{X_L, R_L\}$ is a specific instance of the following time series pair:

$$\{X, R\} = \{\{x_t\}, \{r_t\}\} \qquad (12)$$

where $t = 0, 1, 2, \ldots, L$,

$$x_t = \begin{cases} 0 & \forall\, t \in \{t \mid t \bmod 3 \ne 0\} \\ 1 & \forall\, t \in \{t \mid t \bmod 3 = 0\} \end{cases} \qquad (13)$$

and

$$r_t = x_{t-1} + r_{t-1} \qquad (14)$$

with $r_0 = 0$. The weighted mean observed leaning, under the 1-standard assignment with no tolerance domains, for $\{X, R\}$ depends on L. As L is increased, the leaning calculation will eventually lead to causal inferences that do not agree with intuition; e.g., $L = 20 \Rightarrow \langle\lambda_{r_t,x_{t-1}}\rangle_w \approx 1.8 \times 10^{-3} \Rightarrow X \to R$ and $L = 50 \Rightarrow \langle\lambda_{r_t,x_{t-1}}\rangle_w \approx -2.5 \times 10^{-3} \Rightarrow R \to X$.

As L is increased, the number of possible observed effects for a given observed cause increases. Thus, under the 1-standard assignment $\{C, E\} = \{x_{t-1}, r_t\}$, $x_{t-1} = 1$ precedes three different values, $r_t = 1$, 2, and 3, if L = 10, but it precedes fifteen different values if L = 50. The leaning calculations are methods for counting (in a specific way) the number of times (and ways in which) an observed cause-effect pair appears in the data. The causal inference becomes more difficult for non-stationary time series pairs because repeated cause-effect pairs in the data may be more rare than in the stationary examples. This effect is very similar to the effect seen when the impulse signal was noiseless but the response was noisy. Unfortunately, it cannot be remedied with tolerance domains for the non-stationary case. For example, for $\{X, R\}$, the cardinality of the set $\{r_t \mid x_{t-1} = 1\} \to \infty$ as $L \to \infty$, and penchants would not be defined given a tolerance domain for R of $\delta_r = \infty$.

These shortcomings of the weighted mean observed leaning when applied to non-stationary data, however, do not imply that causal inference of non-stationary data cannot be done using a different application of the observed penchants. For example, replacing the weighted mean calculation in the weighted mean observed leaning calculation with a median calculation leads to a median observed leaning, $[\lambda_{r_t,x_{t-1}}] \approx 5.3 \times 10^{-3} \Rightarrow X \to R$ for L = 50 as expected, where $[\cdot]$ is used to denote the median. Of course, even though the median leaning calculation agrees with intuition for a library length where the mean leaning calculation did not, there is no reason to believe the median leaning calculation will not also eventually provide counterintuitive causal inferences as L is increased.

A more basic strategy for dealing with non-stationary data would be to define the observed penchant using a different cause-effect assignment. For example, the l-standard assignment (with l = 1) used above, i.e., $\{C,E\} = \{x_{t-1}, r_t\}$, might be replaced with an l-AR (autoregressive) assignment with l = 1 of $\{C,E\} = \{(x_{t-1}, r_{t-1}), r_t\}$. An observed penchant may be calculated with an assumed cause of $(x_{t-1} = 1, r_{t-1} = 0)$ and an assumed effect of $r_t = 1$. The algorithms to compute the observed penchants from the data become more complicated as the cause-effect assignment becomes more complicated, but the basic definition of the penchant provides a very general conceptual framework for causal inference.


V. SIMPLE EXAMPLE SYSTEMS 

In this section, the weighted mean observed leaning using the l-standard cause-effect assignment for various l will be applied to dynamical systems and empirical data sets with known causal relationships. The usefulness of the leaning as a tool for causal inference is tested directly with synthetic and empirical time series data sets for which there is an intuitive understanding of the driving relationships within the system.


A. Impulse with Noisy Response Linear Example 

Consider the linear example dynamical system of

$$\{X, Y\} = \{\{x_t\}, \{y_t\}\} \qquad (15)$$

where $t = 0, 1, 2, \ldots, L$,

$$x_t = \begin{cases} 2 & t = 1 \\ 0 & \forall\, t \in \{t \mid t \ne 1 \text{ and } t \bmod 5 \ne 0\} \\ 2 & \forall\, t \in \{t \mid t \bmod 5 = 0\} \end{cases}$$

and

$$y_t = x_{t-1} + B\eta_t$$

with $y_0 = 0$, $B \in \mathbb{R} \ge 0$ and $\eta_t \sim \mathcal{N}(0, 1)$. Specifically, consider $B \in [0, 1]$. The driving system X is a periodic impulse with a signal amplitude above the maximum noise level of the response system, and the response system Y is a lagged version of the driving signal with $\mathcal{N}(0, 1)$ noise of amplitude B applied at each time step.

Figure 1 shows how the weighted mean observed leaning using the 1-standard cause-effect assignment, $\langle\lambda\rangle_w$, changes as the noise amplitude B and tolerance $\delta_y$ are increased in increments of 0.01. The synthetic data sets X and Y are constructed such that intuitively X drives Y. Thus, it is expected that $X \to Y$, which implies $\langle\lambda\rangle_w > 0$. Figure 1 shows that this expectation is met except when $\delta_y > B$ even for a short library length of L = 10. Examples of undefined penchants due to large tolerance domains, as discussed in Section IV F, are seen as $\delta_y$ is increased in the L = 10 example.

Figure 1 shows that using the strategy of $\delta_y = B$ always leads to causal inferences that agree with intuition for L > 10 in this example. However, as discussed in Section IV F, knowing B a priori may be unrealistic with empirical data sets. Consider the following three methods for estimating $\delta_y$ from the data:

1. lagged linear response deviation - $\delta_y$ is set to the mean absolute deviation of $y_t$ from $x_{t-1}$; i.e., $\delta_y = \langle|y_t - x_{t-1}|\rangle$.

2. normalized standard deviation - $\delta_y$ is set to the standard deviation of $\{|Y - \langle Y\rangle|\}$ where $\langle Y\rangle$ is the mean of Y; i.e., $\delta_y = \sigma_{|Y - \langle Y\rangle|}$.

3. n-bin mean standard deviation - $\delta_y$ is set to the mean standard deviation of n bins of Y; i.e., $\delta_y = \langle\sigma_{B_i}\rangle$ where $B_i$ is the ith bin of an n-bin histogram of Y.

Figure 1: The unitless leaning of Eqn. 15 is a function of the noise, the tolerance used for terms from Y, and the length of the signals; both $\delta_y$ and B are unitless. See the text for an explanation of the missing data for large $\delta_y$. The subplots are each for different signal lengths: (a) L = 10, (b) L = 50, (c) L = 250, and (d) L = 1750.

Table I: $\langle\lambda\rangle_w$ using three different estimation methods for $\delta_y$: (1) lagged linear response deviation, (2) normalized standard deviation, and (3) n-bin mean standard deviation.

B      method 1   method 2   method 3
0.0    1.0        1.0        1.0
0.1    0.40       1.0        0.48
0.5    0.39       0.79       0.26
0.8    0.30       0.44       0.10

Table I shows $\langle\lambda\rangle_w$ for instances of Eqn. 15 with B = 0, 0.1, 0.5, and 0.8 and L = 100 (and n = 5 in method 3).

The three different methods yield different values for the leaning, but all the methods lead to the same causal inference, $X \to Y$, which agrees with intuition. These methods are meant to be examples of using the data to set $\delta_y$ if B is not known. These methods are not expected to be reasonable estimates for $\delta_x$ and $\delta_y$ in general. For example, method 1 assumes a linear relationship between X and Y that may be unreasonable to assume in general. However, Table I shows different methods for setting $\delta_y$ can lead to the same causal inference. Setting the tolerances requires an understanding of the noise in the time series data. The leaning is meant to be part of an exploratory causal analysis of the time series data and cannot exist independently of other exploratory analysis of the data, including analysis of the noise levels.
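The three tolerance estimates can be sketched as follows. This is one possible reading of the method descriptions, not the authors' implementation: the function names are hypothetical, population standard deviations are assumed, and method 3 is assumed to use equal-width bins spanning the range of Y with empty bins skipped.

```python
import random
import statistics


def impulse_response(length, noise_amplitude, seed=0):
    """One realization of Eqn. 15: impulse x_t and noisy lagged response y_t."""
    rng = random.Random(seed)
    x = [2.0 if (t == 1 or t % 5 == 0) else 0.0 for t in range(length + 1)]
    y = [0.0] + [x[t - 1] + noise_amplitude * rng.gauss(0.0, 1.0)
                 for t in range(1, length + 1)]
    return x, y


def lagged_linear_response_deviation(x, y):
    """Method 1: mean of |y_t - x_{t-1}|."""
    return statistics.fmean([abs(b - a) for a, b in zip(x[:-1], y[1:])])


def normalized_standard_deviation(y):
    """Method 2: standard deviation of |y_t - <Y>| (population form assumed)."""
    mean_y = statistics.fmean(y)
    return statistics.pstdev([abs(v - mean_y) for v in y])


def n_bin_mean_standard_deviation(y, n_bins=5):
    """Method 3: mean standard deviation inside n equal-width bins of Y."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in y:
        index = min(int((v - lo) / width), n_bins - 1)  # top edge joins last bin
        bins[index].append(v)
    return statistics.fmean([statistics.pstdev(b) for b in bins if b])


x_clean, y_clean = impulse_response(length=100, noise_amplitude=0.0)
x_noisy, y_noisy = impulse_response(length=100, noise_amplitude=0.5)

delta_clean = lagged_linear_response_deviation(x_clean, y_clean)
estimates = (
    lagged_linear_response_deviation(x_noisy, y_noisy),
    normalized_standard_deviation(y_noisy),
    n_bin_mean_standard_deviation(y_noisy, n_bins=5),
)
print(delta_clean, estimates)
```

Each estimate would then be passed as $\delta_y$ to a tolerance-domain leaning calculation; the values depend on the random seed, so no specific numbers are claimed here.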


B. Cyclic Linear Example 

The calculations in the previous subsection were only for the 1-standard assignment (l = 1) and are expected to be useful for causal inference given Eqn. 15. However, deciding which l-standard assignment to use given empirical, rather than synthetic, data sets may be more difficult. It is expected that several different l-standard assignments would be used as part of any exploratory causal analysis using leaning. This section contains an example that plots the leanings for a set of different l-assignments and shows the maximum leaning in the set is near the expected value, i.e., near the lag value that appears explicitly in the dynamical system used to create the synthetic data sets.

Consider the linear example dynamical system of

$$\{X, Y\} = \{\{x_t\}, \{y_t\}\} \tag{16}$$

where t = 0, 1, 2, ..., L,

$$x_t = \sin(t)$$

and

$$y_t = x_{t-1} + B\eta_t$$

with $y_0 = 0$, $B \in [0, 1]$ in steps of 0.01, and $\eta_t \sim \mathcal{N}(0, 1)$.
This example is very similar to the previous one, except
that the driving system X is sinusoidal.

Figures 2 and 3 were calculated for an instance of Eqn.
[16] with L = 41 generated by sampling one period of X
with $t \in \{0, f\pi, 2f\pi, 3f\pi, \ldots, 2\pi\}$ and $f = 1/20$. Figure
2 shows the weighted mean observed leaning using the
1-standard assignment, $\lambda_1$, is always positive given
$\delta_y = B$. So, as was seen in the previous example, the leaning
implies $X \rightarrow Y$, which agrees with intuition for this
example.
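An instance of the cyclic linear system can be generated with a few lines of code. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes that the index t - 1 in the response refers to the previous sample of the driving series, sampled at multiples of $f\pi$.

```python
import math
import random


def cyclic_linear_pair(noise_b, f=1 / 20, periods=1, seed=None):
    """Return (x, y) with x sampled from sin(t) and y_t = x_{t-1} + B * eta_t."""
    rng = random.Random(seed)
    steps = round(2 * periods / f)  # intervals of width f*pi in [0, 2*pi*periods]
    times = [k * f * math.pi for k in range(steps + 1)]
    x = [math.sin(t) for t in times]
    y = [0.0]  # y_0 = 0
    for k in range(1, len(x)):
        # eta_t is drawn from the standard normal distribution N(0, 1).
        y.append(x[k - 1] + noise_b * rng.gauss(0.0, 1.0))
    return x, y


x, y = cyclic_linear_pair(noise_b=0.5, seed=0)
```

With f = 1/20 and one period, the call produces 41 samples per series, matching the signal length quoted above.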

The driving relationship in this example can be difficult
to discern using unmodified CCM techniques [^ .
It has been argued that lagged cross-correlation
techniques are the preferred causal inference tool in most
situations because of their simplicity [2^. The lagged
cross-correlation is defined as

$$\chi^l_{xy} = \frac{E\left[(x_t - \mu_x)(y_{t-l} - \mu_y)\right]}{\sigma_x \sigma_y} \tag{17}$$

where $E[z_t]$ is the expectation value of $\{z_t\}$, $\mu_{x(y)}$ is the
mean of X (Y), and $\sigma_{x(y)}$ is the standard deviation of
X (Y). The cross-correlation is often used for causal
inference by introducing a difference quantity Q

$$\Delta^l_{xy} = \chi^l_{xy} - \chi^l_{yx} . \tag{18}$$

The sign of $\Delta^l_{xy}$ is used, similar to the leaning approach,
to determine the causal inference; i.e., $\Delta^l_{xy} > 0$ implies
X “causes” Y and $\Delta^l_{xy} < 0$ implies Y “causes” X Q.


Figure 2: Weighted mean observed leaning for Eqn. [16]
and a tolerance for the leaning calculation set to
$\delta_y = B$ (horizontal axis). $\lambda_1$ is always positive, which
implies $X \rightarrow Y$.


Figure 3: (Color available online.) The unitless,
normalized leaning, $\lambda'_l$, of Eqn. [16] can be plotted for
different l-standard cause-effect assignments along with
the cross-correlation difference for the same lags, l
(horizontal axis), to show how the two values compare for
this simple cyclic example.


Figure 3 shows how $\Delta^l_{xy}$ compares to the leaning given
$l \in [1, 21]$ for an instance of Eqn. [16] with B = 0.5. In
Figure 3 the leaning has been normalized for presentation
clarity as

$$\lambda'_l = \frac{\lambda_l}{\max_{l \in [1, 21]} \lambda_l} \tag{19}$$

where $\lambda_l$ is the weighted mean observed leaning using
the l-standard assignment ($\lambda_1$ is plotted in Figure 2).
The maximum leaning given $l \in [1, 20]$ is approximately
0.625, so the normalized leanings shown in Figure 3 have
a scaling factor of approximately 1.6.

Both $\lambda'_l$ and $\Delta^l_{xy}$ lead to the same causal inference,
i.e., X “drives” Y, for $l \in [1, 19]$, although only the
leaning agrees with intuition for l = 20 and l = 21 in this
example. Thus, both tools agree with intuition for small
lags in this simple cyclic example. The leaning, however,
has its maximum values near the smallest lags, which is
expected given Eqn. [16], while the cross-correlation
difference has its maximum values at lags that do not
explicitly appear in Eqn. [16].
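The lagged cross-correlation, its difference, and the normalization by the maximum can be sketched directly from their definitions: $\chi^l_{xy} = E[(x_t - \mu_x)(y_{t-l} - \mu_y)]/(\sigma_x \sigma_y)$, the difference $\chi^l_{xy} - \chi^l_{yx}$, and division of each value in a set by the largest value in that set. The code below is a sketch under stated assumptions, not the implementation behind Figure 3: the function names are hypothetical, the expectation is taken over the overlapping samples only, and population means and standard deviations of the full series are used.

```python
import math
import statistics


def lagged_cross_correlation(x, y, lag):
    """chi^l_xy = E[(x_t - mu_x)(y_{t-l} - mu_y)] / (sigma_x * sigma_y)."""
    mu_x, mu_y = statistics.fmean(x), statistics.fmean(y)
    sigma_x, sigma_y = statistics.pstdev(x), statistics.pstdev(y)
    # Expectation over the samples for which both x_t and y_{t-l} exist.
    products = [(x[t] - mu_x) * (y[t - lag] - mu_y) for t in range(lag, len(x))]
    return statistics.fmean(products) / (sigma_x * sigma_y)


def cross_correlation_difference(x, y, lag):
    """Difference chi^l_xy - chi^l_yx; only its sign is used for inference."""
    return lagged_cross_correlation(x, y, lag) - lagged_cross_correlation(y, x, lag)


def normalize_by_maximum(values):
    """Divide every value by the largest value in the set."""
    peak = max(values)
    return [value / peak for value in values]


signal = [math.sin(0.3 * k) for k in range(60)]
self_correlation = lagged_cross_correlation(signal, signal, 0)
```

For a maximum leaning of 0.625, `normalize_by_maximum` multiplies every leaning by 1/0.625 = 1.6, which is the scaling factor quoted above.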


C. RL Circuit Example 


The cross-correlation difference technique is also 
known to be unreliable given nonlinear dynamics Q. 
Leanings of data sets generated from nonlinear dynamics 
will be discussed in Section IV D. Neither of the previous
examples has been physically motivated, so this section 
discusses exploratory causal inference of synthetic data 
sets generated from the well-known dynamics of a phys¬ 
ical system. 

Consider a series circuit containing a resistor, inductor,
and time-varying voltage source related by

$$\frac{dI}{dt} = \frac{V(t)}{L} - \frac{R}{L} I \tag{20}$$

where I(t) is the current at time t, V(t) = sin(t) is the
voltage at time t, R is the resistance, and L is the
inductance. The time series pair for this example is then

$$\{V, I\} = \{\{V_t\}, \{I_t\}\} \tag{21}$$

where V is the set of discrete values of V(t) evaluated
using $t \in \{0, f\pi, 2f\pi, 3f\pi, \ldots, 8\pi\}$ with $f = 1/10$ and I
is the set of discrete values found either by solving Eqn.
[20] numerically or by evaluating the analytical solution

$$I(t) = \frac{D}{L} e^{-t/\tau} + \frac{D}{L\tau} \sin(t) - \frac{D}{L} \cos(t) \tag{22}$$

with $D = \tau^2/(1 + \tau^2)$ and $\tau = L/R$, for the same time set
used for V.
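The numerical and analytical routes to the current can be compared in a short sketch. Two assumptions are made here and are not taken from the text: the initial condition I(0) = 0, for which the closed-form solution of $dI/dt = \sin(t)/L - (R/L)I$ is $I(t) = \frac{\tau^2}{L(1+\tau^2)}\left[e^{-t/\tau} + \sin(t)/\tau - \cos(t)\right]$ with $\tau = L/R$, and a fixed-step fourth-order Runge-Kutta integrator swapped in for an adaptive solver. The function names and the sampled time set are illustrative.

```python
import math


def analytic_current(t, resistance, inductance):
    """Closed-form solution of dI/dt = sin(t)/L - (R/L) I with I(0) = 0 (assumed)."""
    tau = inductance / resistance
    d = tau**2 / (1.0 + tau**2)
    return (d / inductance) * (math.exp(-t / tau) + math.sin(t) / tau - math.cos(t))


def rk4_current(times, resistance, inductance, substeps=200):
    """Integrate the circuit equation with classical RK4 between sample times."""

    def rate(t, current):
        return math.sin(t) / inductance - (resistance / inductance) * current

    current, values = 0.0, [0.0]  # I(0) = 0 is an assumption of this sketch
    for t_start, t_end in zip(times, times[1:]):
        h = (t_end - t_start) / substeps
        t = t_start
        for _ in range(substeps):
            k1 = rate(t, current)
            k2 = rate(t + h / 2, current + h * k1 / 2)
            k3 = rate(t + h / 2, current + h * k2 / 2)
            k4 = rate(t + h, current + h * k3)
            current += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        values.append(current)
    return values


# Sample times at multiples of pi/10; the end point 8*pi is an assumed value.
times = [k * math.pi / 10 for k in range(81)]
numeric = rk4_current(times, resistance=20.0, inductance=5.0)
exact = [analytic_current(t, 20.0, 5.0) for t in times]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
```

The small but nonzero difference between the two series is the kind of numerical "noise" that matters when a tolerance domain is chosen for a system with no explicit noise term.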

Physical intuition is that V drives I, and so we expect
to find that $V \rightarrow I$. The weighted mean observed
leaning using the 1-standard assignment, $\lambda_1$, can be used
to test this expectation. Unlike the previous examples,
however, there is no noise term in the dynamics (such as
B in Eqn. [15] and [16]), so setting the tolerance domains,
e.g., $\delta_I$, will not be as straightforward.

Table II shows $\lambda_1$ for both the analytical solution and
a numerical solution to Eqn. [20] using the ode45
integration function in MATLAB. The time series V is created
by defining values at fixed points and using linear
interpolation to find the time steps required by the ODE
solver. Two different physical scenarios are considered in
which L and R are constant: L = 10 H and R = 5 Ω, and
L = 5 H and R = 20 Ω.

The previously discussed strategy of increasing $\delta_I$
until the leaning becomes undefined and then reporting the
leaning calculated using the largest $\delta_I$ for which it is
defined would lead to a causal inference that agrees with
intuition for this example. Specifically, from Table II(a),
$\delta_I = 10^{-2} \Rightarrow \lambda_1 \approx 0.7 \Rightarrow V \rightarrow I$ as expected.
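The strategy of increasing the tolerance until the leaning becomes undefined, and then reporting the last defined value, can be sketched as a simple sweep. The function name is hypothetical, the leaning calculation itself is represented by a caller-supplied function that returns `None` when the leaning is undefined, and the values in the usage example are invented for illustration and are not results from this article.

```python
def largest_defined_leaning(leaning_for_tolerance, tolerances):
    """Sweep tolerances in increasing order; keep the last defined leaning."""
    chosen = None
    for tolerance in sorted(tolerances):
        value = leaning_for_tolerance(tolerance)
        if value is None:  # the leaning is undefined at this tolerance: stop
            break
        chosen = (tolerance, value)
    return chosen


# Hypothetical leanings, for illustration only (not values from this article).
toy_leanings = {0.0: -0.2, 0.5: 0.1, 1.0: 0.4, 2.0: None}
result = largest_defined_leaning(toy_leanings.get, toy_leanings.keys())
```

The sweep returns both the selected tolerance and the corresponding leaning, so the sign of the reported value can be read off directly.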

Discussion on setting the tolerance domains has
centered on understanding the noise in the system. This
example illustrates that the “noise” being considered does
not need to be a physical noise source in the system (there
are no explicit noise terms in Eqn. [20]). For example, the
numerical tolerance of the ODE solver was set to 10“^
for the results shown in both Table II(a) and (b), and for
both examples setting $\delta_I$ = 10“^ would lead to causal
inferences that agree with intuition.

Consider, for example, the peak values of V. The
time steps of these peaks are $T_{peak} = \{t \mid V_t = 1\} =
\{6, 26, 46, 66\}$. The values of I given $\tau = 0.25$ that
immediately follow these peaks are
$\{I_t \mid t \in \{7, 27, 47, 67\}\}$. The peak values associated
with $\tau = 2$ can also be found. The standard deviation of
the first set is $\sigma^{peak}_{0.25} \approx$ 10“® and the standard
deviation of the second set is $\sigma^{peak}_{2} \approx$ 10“^.
Table II(a) (for $\sigma^{peak}_{0.25}$) and (b) (for
$\sigma^{peak}_{2}$) show that setting $\delta_I$ to the appropriate
standard deviation of the peaks would lead to causal
inferences that agree with intuition. Rather than physical
noise levels, the noise levels used to set the tolerance
domains for the leaning calculations are better thought of
as the spread in the possible values of an assumed effect
that may reasonably be considered due to the same assumed
cause.

This example can also illustrate the importance of
sample frequency and sample length. The leaning
calculation requires an assumed cause and effect pair to
appear in the data enough times to provide reliable
estimates of probabilities. Thus, data that is sampled
for too few periods or too sparsely can lead to
counter-intuitive leanings. For example, if there is only a
single peak in the assumed driving time series because of
poor sampling, then there can only be a single response
value, which would be insufficient to reliably provide the
conditional probabilities in the leaning calculation for
that assumed cause-effect pair. For Eqn. |H] with the
analytical solution for I, if $\delta_I$ = 10“^ and $\tau = 0.25$,
then $t \in \{0, f\pi, 2f\pi, 3f\pi, \ldots, 2\pi\}$ with $f = 1/10$ leads
to $\lambda_1 = -0.045$ and $t \in \{0, f\pi, 2f\pi, 3f\pi, \ldots, 8\pi\}$ with
$f = 2/3$ leads to $\lambda_1 = -0.167$, both of which disagree
with intuition.


(a) R = 20 Ω, L = 5 H

$\delta_I$     $\lambda_1$ (ode45)   $\lambda_1$ (analytical)
0              -0.132                -0.089
$10^{-6}$      -0.132                 0.493
$10^{-5}$      -0.108                 0.548
$10^{-4}$       0.188                 0.564
$10^{-3}$       0.582                 0.581
$10^{-2}$       0.730                 0.727
$10^{-1}$       undefined             undefined

(b) R = 5 Ω, L = 10 H

$\delta_I$     $\lambda_1$ (ode45)   $\lambda_1$ (analytical)
0              -0.132                -0.132
$10^{-6}$      -0.132                -0.132
$10^{-5}$      -0.120                -0.096
$10^{-4}$       0.011                 0.098
$10^{-3}$       0.398                 0.386
$10^{-2}$       0.676                 0.675
$10^{-1}$       0.314                 0.315

Table II: The leaning $\lambda_1$ depends on both $\delta_I$ and the
method for computing I in this example. These two cases show
that the values of $\delta_I$ for which the leaning starts to agree
with intuition can also depend on the physical system
parameters (e.g., $\tau$).


D. Nonlinear Example

The examples so far have all had a linear relationship
between the driving signal and the response signal.
Of the four broad categories of time series causality
tools, transfer entropy and SSR methods are
the two categories that can be applied to nonlinear data
sets without modification. The conceptual framework
of Granger causality is not restricted by the linearity of
the data set @ , but the original formulation by Granger
must be modified to do so [13]. Lagged cross-correlation
techniques are known to be unreliable if the data sets are
generated by nonlinear dynamics Q. The next examples
are for nonlinear systems.

Consider the nonlinear dynamical system of

$$\{X, Y\} = \{\{x_t\}, \{y_t\}\} \tag{23}$$

where t = 0, 1, 2, ..., L,

$$x_t = \sin(t)$$

and

$$y_t = A x_{t-1} (1 - B x_{t-1}) + C\eta_t ,$$

with $y_0 = 0$, $A, B, C \in [0, 1]$, and $\eta_t \sim \mathcal{N}(0, 1)$,
with $t \in \{0, f\pi, 2f\pi, 3f\pi, \ldots, 6\pi\}$ and $f = 1/30$ so that
L = 181.

Figure 4 shows the weighted mean observed leaning
using the 1-standard assignment, i.e., $\lambda_1$, agrees with
intuition over the considered domains of A, B, and C if
the tolerance domain for Y is set to the noise level, i.e.,
$\delta_y = C$. The result of $X \rightarrow Y$ shows that causal
inference using leanings on data sets generated from
nonlinear dynamics can be performed similarly, and can lead
to similarly intuitive results, as with the data sets
generated from linear dynamics.



Figure 4: Leaning, $\lambda_1$, of Eqn. [23] computed with
$\delta_y = C$ as a function of all three unitless parameters in
Eqn. [23]: A, B, and C. The leaning agrees with intuition in
this example for all the tested parameter values. The
subplots are each for a different value of C: (a) C = 0.2,
(b) C = 0.4, (c) C = 0.6, and (d) C = 0.8.


E. Coupled Logistic Map Example 

Proponents of SSR time series causality tools have 
pointed out the limitations of tools like lagged cross¬ 
correlation and Granger causality when the dynamics ex¬ 
hibit chaotic behavior EH- A chaotic system is consid¬ 
ered in this section. 

Consider the nonlinear dynamical system of

$$\{X, Y\} = \{\{x_t\}, \{y_t\}\} \tag{24}$$

where t = 0, 1, 2, ..., L,

$$x_t = x_{t-1} \left(r_x - r_x x_{t-1} - \beta_{xy} y_{t-1}\right)$$

and

$$y_t = y_{t-1} \left(r_y - r_y y_{t-1} - \beta_{yx} x_{t-1}\right)$$

where the parameters $r_x, r_y, \beta_{xy}, \beta_{yx} \in \mathbb{R} > 0$. This pair
of equations is a specific form of the two-dimensional
coupled logistic map system often used to model population
dynamics [^, and it was a system used in the introduction
of convergent cross mapping (CCM), which is an
SSR time series causality tool [11].

Sugihara et al. [11] note that $\beta_{xy} > \beta_{yx}$ intuitively
implies Y “drives” X more than X “drives” Y, and vice
versa. Such intuition, however, can be difficult to justify
for all instances of Eqn. [24]. The $x_{t-1}$ term that appears
in $y_t$ can be seen as a function of $x_{t-2}$ with coefficients
of $\beta_{yx} r_x$. These product coefficients suggest that if
$r_x > r_y$, then X may be seen as the stronger driver in the
system even if $\beta_{yx} < \beta_{xy}$. The same argument can be
made, with the appropriate substitutions, to show that
Y may be seen as the stronger driver in the system even
if $\beta_{xy} < \beta_{yx}$. As such, there is no clear intuitive causal
inference for this system. The conjectures presented in
this paragraph, however, are supported by the leaning
calculations (using the 1-standard assignment).



Figure 5: Leaning, $\lambda_1$, of Eqn. [24] is a function of four
unitless parameters in Eqn. [24]: $r_x$, $r_y$, $\beta_{xy}$, and $\beta_{yx}$
(along with the initial conditions $x_0$ and $y_0$, which are
fixed in this example). The tolerance domains are set as
$\delta_x = \sigma_{x_t - \langle x_t \rangle}$ and $\delta_y = \sigma_{y_t - \langle y_t \rangle}$.
The leaning is defined using the 1-standard assignment, so
$\lambda_1 > 0 \Rightarrow X \rightarrow Y$. The subplots are each for
different parameter values: (a) $r_x = r_y = 3.5$, (b)
$r_x = 4.0$, $r_y = 2.0$, (c) $r_x = 2.0$, $r_y = 4.0$, and (d)
$r_x = 3.8$, $r_y = 3.2$.


Figure 5 shows four instances of Eqn. [24] with different
values for $r_x$ and $r_y$. Each instance has a library length of
L = 500 and initial conditions of $x_0 = 0.4$ and $y_0 = 0.4$.
There is no clear, intuitive driver in this example, so
both tolerance domains must be set in the leaning
calculation. The leaning is calculated using the 1-standard
cause-effect assignment and estimated tolerance domains
of $\delta_x = \sigma_{x_t - \langle x_t \rangle}$ and $\delta_y = \sigma_{y_t - \langle y_t \rangle}$.
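A single instance of the coupled logistic map, together with standard-deviation tolerance estimates, can be sketched as follows. The function name and the coupling values are illustrative, and two assumptions are made: the library length is interpreted as the number of samples, and the standard deviation of the mean-subtracted series is computed in population form (which equals the standard deviation of the series itself).

```python
import statistics


def coupled_logistic_map(r_x, r_y, beta_xy, beta_yx, x0=0.4, y0=0.4, length=500):
    """Iterate the two-dimensional coupled logistic map from (x0, y0)."""
    x, y = [x0], [y0]
    for _ in range(length - 1):
        x_prev, y_prev = x[-1], y[-1]
        x.append(x_prev * (r_x - r_x * x_prev - beta_xy * y_prev))
        y.append(y_prev * (r_y - r_y * y_prev - beta_yx * x_prev))
    return x, y


# Illustrative parameter values (the coupling strengths are assumptions).
x, y = coupled_logistic_map(r_x=3.5, r_y=3.5, beta_xy=0.1, beta_yx=0.2)

# Tolerance estimates: spread of each series about its own mean.
delta_x = statistics.pstdev(x)
delta_y = statistics.pstdev(y)
```

Repeating the call over a grid of coupling values for fixed growth rates gives the inputs for one subplot of the kind described above.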

Figure 5(a) shows the intuition of $\beta_{xy} < \beta_{yx} \Rightarrow
X \rightarrow Y$ can be true if $r_x = r_y$. However, Figure 5(b)
and (c) show $r_x > r_y \Rightarrow X \rightarrow Y$ and
$r_x < r_y \Rightarrow Y \rightarrow X$ can be strong enough
implications to make the values of $\beta_{xy}$ and $\beta_{yx}$ irrelevant
over the considered domains. Figure 5(d) shows this effect
can be pronounced even in instances of Eqn. [24] where $r_x$
and $r_y$ are close.


Sugihara et al. [11] also discuss how a naive
application of Granger causality to the system described
in Eqn. [24] may lead to conclusions that do not agree
with intuition, while CCM does. The causal inference
suggested by the leaning calculations of this subsection
implies both CCM and leanings may be useful time
series causality tools in situations where Granger causality
is not. It has also been shown that CCM may fail to
agree with intuition in example systems for which it has
already been shown that leaning calculations do, e.g.,
Section IV C [2l.

The complexity of determining causal relationships in
this system may make the system less of a convincing
example of the leaning calculation than the previous
examples. However, Figure 5 shows the weighted mean
observed leaning using the 1-standard cause-effect
assignment can provide causal inferences that may be
considered intuitively justifiable, even if the system does
not have an unequivocal driver.


F. Impulse with Multiple Noisy Responses 
Example 


All of the previous examples have ignored possible
causal confounders. The presence of confounders in the
system is a serious problem for causal inference in
general im. Time series causality usually seeks to
answer the less general causal inference question of “Given
two time series, which may be considered the stronger
driver?” Nevertheless, some bivariate time series
causality tools consider causal inference in systems with
potential confounders by trying to relate the estimated
bivariate driving relationships within a collection of more
than two time series data sets (e.g., see CCM [11]). The next
example explores the use of leaning calculations in such a
scenario.

Consider the multivariate system of

$$\mathbf{T} = \{X, Y, Z\} = \{\{x_t\}, \{y_t\}, \{z_t\}\} \tag{25}$$

where t = 0, 1, 2, ..., L,

$$x_t = \begin{cases} 2 & t = 1 \\ 0 & \forall\, t \in \{t \mid t \neq 1 \text{ and } t \bmod 5 \neq 0\} \\ 2 & \forall\, t \in \{t \mid t \bmod 5 = 0\} \end{cases}$$

and

$$y_t = x_{t-1} + B\eta_t ,$$

and either (case 1)

$$z_t = y_{t-1} \tag{26}$$

or (case 2)

$$z'_t = y_{t-1} + y_t = y_{t-1} + x_{t-1} + B\eta_t \tag{27}$$

or (case 3)

$$z''_t = y_{t-1} + x_{t-1} + z''_{t-1} \tag{28}$$

with $y_0 = 0$, $B \in \mathbb{R} > 0$, $\eta_t \sim \mathcal{N}(0, 1)$, and L = 500.

In case 1, Z depends directly on Y and indirectly on X
(through Y, which depends directly on X). The intuitive
causal inference is then $Y \rightarrow Z$ and $X \rightarrow Z$. Case 2,
despite the additional Y dependence in Z, has the same
intuitive causal inference as case 1. In case 3, Z depends
directly on itself and both Y and X. Case 3 also has the
same intuitive causal inference.
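The impulse system and its three response cases can be generated with a short sketch. The function name is hypothetical, and two assumptions are made that are not stated in the text: each response series starts at zero, and the autoregressive term in case 3 refers to the previous value of the case 3 series itself.

```python
import random


def impulse_system(noise_b, length=500, seed=None):
    """Return x, y and the three candidate z series for t = 0, 1, ..., length."""
    rng = random.Random(seed)
    n = length + 1
    # Periodic impulse: 2 at t = 1 and whenever t mod 5 = 0, otherwise 0.
    x = [2.0 if (t == 1 or t % 5 == 0) else 0.0 for t in range(n)]
    y = [0.0]  # y_0 = 0
    for t in range(1, n):
        y.append(x[t - 1] + noise_b * rng.gauss(0.0, 1.0))
    z1, z2, z3 = [0.0], [0.0], [0.0]  # assumption: every z series starts at 0
    for t in range(1, n):
        z1.append(y[t - 1])                        # case 1
        z2.append(y[t - 1] + y[t])                 # case 2
        z3.append(y[t - 1] + x[t - 1] + z3[t - 1])  # case 3 (autoregressive)
    return x, y, z1, z2, z3


x, y, z1, z2, z3 = impulse_system(noise_b=0.6, seed=0)
```

Each of the pairs (X, Y), (Y, Z), and (X, Z) can then be passed to a leaning calculation with the tolerance domains set as described in the text.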


[Plot legend: • XY, × YZ, ○ XZ.]

Figure 6: The (unitless) weighted mean observed
leaning using the 1-standard cause-effect assignment, $\lambda_1$,
only leads to causal inferences that agree with intuition
for all points within the considered noise level domain,
B, in this example for case 2. B is the unitless noise
parameter found in Eqn. [25], and the symmetric
tolerance domains for Y and Z are set to this value. All
of the leanings are expected to be positive in each case.
The subplots are each for one of the three different
cases: (a) $Z = \{z_t\}$ (case 1), (b) $Z = \{z'_t\}$ (case 2), and
(c) $Z = \{z''_t\}$ (case 3).

Figure 6 shows the weighted mean observed leaning
using the 1-standard cause-effect assignment (with
$\delta_x = 0$ and $\delta_y = \delta_z = B$), $\lambda_1$, may lead to causal
inferences that do not agree with intuition for case 1 and
case 3, even though case 2 agrees with intuition for all
points within the considered noise level domain. For case 1,
the leaning calculation implies $X \rightarrow Y$ and $Y \rightarrow Z$ as
expected, but it also seems to imply that no causal inference
can be made about the relationship between X and Z.
For case 3, the leaning calculation also implies no causal
inference can be made about the relationship between X
and Z, but, unlike case 1, it also implies $Z \rightarrow Y$, which
is counter-intuitive.


pair      case 1    case 2    case 3
X, Y       0.150     0.159     0.169
Y, Z      -0.002     0.133     0.447
X, Z       0.691     0.030     0.735

Table III: The leaning calculation depends strongly on
the cause-effect assignment. The table shows the
weighted mean observed leaning using the 1-AR
assignment and may be compared with Figure 6, which
showed this leaning calculation using the 1-standard
assignment.


These results may imply that $\lambda_1$ is unable to identify
confounded driving (i.e., situations in which the effect of
the driving variable is mediated by another variable). For
example, in case 1, the driving of Z by X occurs through
the interaction of Y and Z. For case 1, $\lambda_1$ implies
$X \rightarrow Y \rightarrow Z$ but does not imply $X \rightarrow Z$. For case 3,
$\lambda_1$ implies $X \rightarrow Y \leftarrow Z$, which may imply that $\lambda_1$
is not a reliable causal inference tool in autoregressive
systems.

The results of Figure 6 may also be considered an
indication that the cause-effect assignment is insufficient.
It was previously mentioned that exploratory causal
analysis using the leaning would involve comparing several
different cause-effect assignments. The set of tested
cause-effect assignments need not only include l-standard
assignments. Consider the weighted mean observed
leaning using the 1-AR cause-effect assignment, i.e.,
$\{C, E\} = \{x_{t-1} \text{ and } y_{t-1}, y_t\}$. Table III shows this
leaning calculation, using $\delta_y = \delta_z = B = 0.6$, for the same
bivariate relationships shown in Figure 6.

Table III implies $X \rightarrow Y \rightarrow Z$ for cases 2 and
3, as expected, but not for case 1 (which $\lambda_1$ did imply).
The leaning calculations are part of an exploratory causal
analysis and must be considered using several different
cause-effect assignments when trying to understand the
potential causal structure of a set of time series data.
The cause-effect assignments can also be expanded
beyond the bivariate and autoregressive definitions, e.g.,
$\{C, E\} = \{x_{t-1} \text{ and } y_{t-1} \text{ and } z_{t-1}, y_t\}$, but such
extensions will not be considered in this article.


VI. EMPIRICAL DATA

Empirical data sets with known (or assumed) causal re¬ 
lationships may be used to understand how exploratory 
causal inference using leanings might be done if the sys¬ 
tem dynamics are unknown (or sufficiently complicated 
to make first principle numerical comparisons cumber¬ 
some). The examples shown in this section are intended 
to demonstrate that causal inference using leaning cal¬ 
culations can agree with the causal “truth” in empirical 
data sets. The analysis shown here is not expected to 
illustrate how the leaning may be used for exploratory 
causal analysis of empirical data for which there is no 
causal “truth”. Such analysis is expected to be more 
complicated than that which is shown here (e.g., involv¬ 
ing multiple tolerance domain calculations and the com¬ 
parison of different cause-effect assignments). 

Figure 7 shows a time series pair with causal “truth”
from the UCI Machine Learning Repository (MLR) [30].
This data repository is a collection of data sets (some of
which are time series) with known, intuitive, or assumed
causal relationships meant for use in the testing of causal
discovery algorithms in machine learning [30].


Figure 7: This time series data pair is expected to have
the causal relationship of $X \rightarrow Y$, where X or Y is
marked in parentheses for each time series. Subplot (a)
is the mean temperature (X) and (b) is the snowfall
(Y).


Figure 8: The weighted mean observed leaning, $\lambda_l$,
using the l-standard cause-effect assignment for
$l \in [0, 21]$. The time series pair is $\mathbf{T} = (X, Y)$ with X and
Y shown in Figure 7(a) and (b), respectively. The
expected causal inference is $X \rightarrow Y$; i.e., the
expectation is $\lambda_l > 0$ for every point in the plotted
domain.


Figure 7(a) and (b) are time series of the mean
temperature (the expected driver) and the daily snowfall
(the expected response) from July 1 1972 to December 31 2009
at Whistler, BC, Canada (Latitude: 50°04'04.000" N,
Longitude: 122°56'50.000" W, Elevation: 1835.00 meters).
From [ 3 ^, “Common sense tells us that X [mean
temperature] causes Y [snow fall] (with maybe very small
feedback of Y on X). Confounders are present (e.g., day
of the year).” These time series correspond to data set
87 of the MLR [13].

As noted previously, the primary difficulty in using the 
leaning for exploratory causal analysis is the determi¬ 
nation of the cause-effect assignment and tolerance do¬ 
mains. The above data is meant only to illustrate the use 
of leanings, so while a thorough analysis of the noise in 
the system should precede the leaning calculations, such 
a step is avoided here for brevity. 

The symmetric tolerance domains are estimated using 
the maximum standard deviations of the n sets of binned 
points of an n-bin histogram of the normalized time series 
data X' and Y', where X' = y' = and 

$n = \lfloor 0.1L \rfloor$ (i.e., n is the largest integer that is not larger
than 10% of the library length L). This estimation is
similar to the n-bin mean standard deviation technique
discussed in Sec. IV A.

The cause-effect assignment will be set naively
because, again, the purpose of this article is not to study
these particular time series in detail. To reiterate the
previous comment regarding tolerance domains, detailed
study would be required to have confidence in using
leanings for exploratory causal analysis. However, the
convenience of having causal “truths” is that we can take
the naive approach of simply testing many different
cause-effect assignments and comparing the results to the
expected causal inference.

Figure 8 shows the weighted mean observed leaning,
$\lambda_l$, using the l-standard cause-effect assignment with
$l \in [0, 21]$ and using $\delta_x$ and $\delta_y$ estimated in the manner
described above.

Figure 8 shows the leanings, using the given tolerance
domains, imply causal inferences that agree with
the causal truths for the tested pair with l-standard
assignments.

This example also highlights the problem of determining
which l-standard assignment to use for the causal
inference. If it is decided that the causal inference depends
on

$$\lambda = \lambda_{l'} \tag{29}$$

where

$$|\lambda_{l'}| = \max_l |\lambda_l| , \tag{30}$$

then $\lambda = 0.040 \Rightarrow X \rightarrow Y$, which agrees with the
causal truth.
The NASA OMNI data set consists of hourly-averaged
time series measurements of several different space
weather parameters from 1963 to present, collected from
more than twenty different satellites, along with sunspot
number and several different geomagnetic indices,
including $D_{st}$, collected from the NOAA National Geophysical
Data Center [31]. The disturbance storm time, $D_{st}$, is a
measure of geomagnetic activity •. The magnetic field
measurements in the OMNI data set, specifically $B_z$ in
GSE coordinates [S^, are believed to be a driver of $D_{st}$
M-.

Let $\mathbf{T}_L = \{\{B_z(t')\}, \{D_{st}(t')\} \mid t' \in [t_0, t_0 + L]\}$ be
an ordered subset of the available time series data
$\{\{B_z(t)\}, \{D_{st}(t)\} \mid t = 0, 1, 2, \ldots, N\}$ where N is the
number of hourly data points in the OMNI data set. If
$\lambda_l$ is the weighted mean observed leaning for $\mathbf{T}_L$ using
the l-standard cause-effect assignment, then n samples
of $\mathbf{T}_L$, each with a different $t_0$, would produce a set of n
leanings, $\{\lambda_l\}$, from which the causal inference could be
drawn.

Let L = 500 and n = 10"^. The symmetric tolerance
domains are naively set with
$\delta_{B_z} = f\sigma_{|B_z - \langle B_z \rangle|}$ and
$\delta_{D_{st}} = f\sigma_{|D_{st} - \langle D_{st} \rangle|}$
for each sampled time series of length L with f = 0.05.
The starting points for each time series are sampled from
a uniform distribution over [0, N − L]. Figure 9(a) shows
the causal inference drawn from each set of leanings
agrees with intuition, i.e., $B_z \rightarrow D_{st}$, if the causal
inference is based on, e.g., the mean value from the
set of leanings $\langle \lambda_l \rangle$ with l = 1. The algebraic means
$\langle \lambda_l \rangle$ found using different l-standard assignments with
$l \in [1, 20]$ are shown in Figure 9(b). The causal inference
is $B_z \rightarrow D_{st}$ for every l in the plotted domain.
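The sampling step can be sketched as follows: contiguous windows of length L are drawn with uniformly distributed starting points, and a tolerance is set for each window as a fraction f of the standard deviation of the absolute deviations from the window mean. The function names are hypothetical, and the population form of the standard deviation is an assumption; the leaning calculation that would be applied to each window is not shown.

```python
import random
import statistics


def sample_windows(series_a, series_b, window, count, seed=None):
    """Draw `count` aligned windows of length `window` from two series."""
    rng = random.Random(seed)
    last_start = len(series_a) - window
    windows = []
    for _ in range(count):
        start = rng.randint(0, last_start)  # uniform over [0, N - L], inclusive
        stop = start + window
        windows.append((series_a[start:stop], series_b[start:stop]))
    return windows


def scaled_std_tolerance(values, f=0.05):
    """Tolerance f * sigma of |v - <v>| for one sampled window."""
    mean = statistics.fmean(values)
    return f * statistics.pstdev([abs(v - mean) for v in values])
```

Each window yields one leaning (when it is defined), so n windows give a set of at most n leanings whose distribution can then be examined.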

Figure 9: (Color available online) (a) A histogram of
the $(B'_z, D_{st})$ set of n = 10^ (unitless) weighted mean
observed leanings, using the 1-standard cause-effect
assignment, i.e., $\lambda_1$, shows causal inferences that agree
with intuition for the OMNI data set. The red dashed
lines show the algebraic mean of the sets. The
geomagnetic field component $B'_z$ is calculated by taking
the $B_z$ time series in the OMNI data set and then
setting $B'_z = 0$ if $B_z > 0$ [11,13^. (b) The algebraic
means of the aforementioned sets of weighted mean
observed leanings, i.e., $\langle \lambda_l \rangle$, are positive for all leanings
calculated using l-standard assignments given $l \in [1, 20]$.


This example is the first for which a set of leanings
has been used for causal inference, which may imply
statistical testing should be used. The sample mean was
used for causal inference and happened to agree with
intuition for this example, but would the same conclusion
be drawn using a formal hypothesis test? How should the
null hypothesis and test statistic be constructed? Such
questions can be subtle (see, e.g., [Hi). The sampling
procedure used to produce Figure 9(a) produces 2,796
defined leanings, 95% of which are below 4.2 × 10“^ and 5%
of which are below −4.0 × 10“^. A 90% confidence that
the leaning falls in the interval [−4.0 × 10“^, 4.2 × 10“^],
however, is not a strong indication that the data
supports the intuitive causal structure. The mean of the set
is $\mu$ = 3.5 × 10“^, and the variance is $\sigma^2$ = 7.0 × 10“®.
If it is assumed that the leaning in this example is
distributed as $\mathcal{N}(\mu, \sigma^2)$, then a 95% confidence interval may
be $[\mu - 2\sigma, \mu + 2\sigma]$ = [−4.9 × 10“^, 5.6 × 10“^], which,
again, does not strongly support the intuitive causal
inference for this example. Approximately 40% of the
leanings in this example are negative, which may imply that
there is only a 60% confidence that this data supports the
intuitive causal inference, given the tolerance domains
and cause-effect assignments.

Suppose a null hypothesis is defined as $\langle \lambda_l \rangle = 0$. The
standard error is SE = $\sigma/\sqrt{n}$ = 5.0 × 10“^, from which
the t-test statistic follows [s^ as t = $\mu$/SE = 7.08. A
two-tailed t-test (i.e., calculating the p-value with an
alternative hypothesis of $\langle \lambda_l \rangle \neq 0$) returns a p-value of
approximately zero [ 3 ^ , which implies the null hypothesis
should be rejected in favor of the alternative at any
significance level. A right-tailed t-test (i.e., the alternative
hypothesis is $\langle \lambda_l \rangle > 0$) also returns a p-value of
approximately zero. A left-tailed t-test (i.e., the alternative
hypothesis is $\langle \lambda_l \rangle < 0$) returns a p-value of approximately
one, which implies the null hypothesis cannot be rejected
in favor of the alternative at any significance level. These
hypothesis tests seem to imply the population mean of
the sampled leanings calculated in this example is likely
not zero (which implies the time series pair has some
causal structure) and is likely greater than zero (which
implies the causal inference made with the leaning agrees
with intuition). These conclusions, however, depend on
whether or not the t-test is applicable to this example.
For example, the assumption that the sample mean of
the leanings can be assumed to follow a normal
distribution based on the central limit theorem may rely
on the sampled time series from which the leanings were
calculated being independent and identically distributed,
which may not be true. The assumptions used in these
statistical tests are intended to be illustrative. Such
assumptions should be explored in depth to formally
develop a statistical test for causal inference using leanings.
The sampling procedure used in this example may not be
applicable to other data sets for which the leaning may
still be a useful causal inference tool. Thus, it may not be
possible for a single statistical test to be appropriate
for all sets of leaning calculations.

A bootstrapping procedure can be set up with the
sample of leaning calculations, whereby 10 ® means are
calculated from new sets (of the same size as the original
set) of leanings that have been sampled (with
replacement) from the original set. This procedure yields no
negative means; the null hypothesis that the mean
leaning value is actually negative (i.e., $\langle \lambda_l \rangle < 0$) can be
rejected with a p-value less than 10“®. The 90% confidence
interval for the mean of the 10 ® bootstrapped means is
[3.48 × 10“®, 3.57 × 10“®], which, again, implies the mean
leaning for this example is positive.
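The bootstrapping procedure can be sketched with the standard library alone. The function names are hypothetical, the percentile interval shown is one common way to summarize bootstrapped means (an assumption, since the interval construction is not specified above), and the sample used in the usage example is invented for illustration.

```python
import random
import statistics


def bootstrap_means(sample, resamples, seed=None):
    """Means of `resamples` sets drawn with replacement, each of len(sample)."""
    rng = random.Random(seed)
    size = len(sample)
    return [statistics.fmean(rng.choices(sample, k=size)) for _ in range(resamples)]


def percentile_interval(values, level=0.90):
    """Central interval containing roughly `level` of the sorted values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


# Hypothetical leanings, for illustration only (not the OMNI results).
toy_sample = [0.01 * k for k in range(1, 21)]
means = bootstrap_means(toy_sample, resamples=200, seed=0)
fraction_negative = sum(1 for m in means if m < 0) / len(means)
interval = percentile_interval(means)
```

The fraction of negative bootstrapped means plays the role of the one-sided p-value estimate discussed above; with no negative means among the resamples, that estimate is bounded by one over the number of resamples.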


VII. SPURIOUS LEANINGS 


Consider the linear system of 

\[
\{X, Y\} = \{\{x_t\}, \{y_t\}\} \tag{31}
\]

where $t = 0, 1, 2, \ldots, L$, 

\[
x_t = \begin{cases}
2 & t = 1 \\
0 & \forall\, t \in \{t \mid t \neq 1 \text{ and } t \bmod 5 \neq 0\} \\
2 & \forall\, t \in \{t \mid t \bmod 5 = 0\}
\end{cases}
\]

and $y_t = \eta_t$ with $\eta_t \sim \mathcal{N}(0,1)$. The first time series, X, is the 
periodic impulse that drove the example system in Eqn. 15. 
The second time series, Y, is standard Gaussian noise 
applied at each time step. 

There is no intuitive causal relationship in Eqn. 31. 
However, Figure 10 shows that the weighted mean observed 
leaning using the 1-standard assignment may lead to 
spurious causal inferences for different symmetric tolerance 
domains $\delta_y$, given $\delta_x = 0$. The causal inference becomes 
inconclusive as the library length L is increased; i.e., the 
leaning moves towards zero for the tested tolerance 
domains as the library length of Eqn. 31 is increased. 
However, the use of leanings for causal inference with Eqn. 
31 at smaller library lengths, e.g., L = 10, may imply a 
spurious relationship. 

Figure 10: Eqn. 31 leads to spurious leanings, i.e., 
weighted mean observed leanings using the 1-standard 
assignment, $\lambda_1$, that depend on both the tolerance 
domain $\delta_y$ and the library length L. 

The spurious leanings shown in this example may imply 
causal relationships that do not exist in the system. 
Leaning calculations may be part of an exploratory causal 
inference analysis, but care must be taken to ensure a 
causal relationship is actually present in the data, even 
if the directionality (or other features) of that 
relationship is unknown. The relationship between leanings and 
causality as it is typically understood in physics (i.e., 
involving interventions into the systems under 
investigation, e.g., through experiments i) is not currently 
known. This article explores the use of leanings as 
part of an exploratory causal analysis in time series data, 
not as a definition or proof of causality in a dynamical 
system. 
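
The two series of Eqn. 31 are simple to generate, which may help when reproducing the library-length dependence discussed in this section. The following is a minimal sketch: the function name `impulse_and_noise` and the `seed` parameter are hypothetical, and the sketch only builds the data, it does not compute leanings.

```python
import numpy as np


def impulse_and_noise(L, seed=0):
    """Generate X (periodic impulse) and Y (Gaussian noise) for t = 0..L.

    Hypothetical helper following Eqn. 31: x_t = 2 when t = 1 or
    t mod 5 = 0, and x_t = 0 otherwise; y_t is drawn from N(0, 1).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(L + 1)  # library of L + 1 time steps, t = 0, 1, ..., L

    # Impulse of height 2 at t = 1 and at every multiple of 5.
    x = np.where((t == 1) | (t % 5 == 0), 2.0, 0.0)

    # Independent standard Gaussian noise at each time step.
    y = rng.standard_normal(L + 1)
    return x, y


if __name__ == "__main__":
    x, y = impulse_and_noise(L=10)
    print(x)
    print(y)
```

Because Y is drawn independently of X, any apparent driving relationship found between the two series at short library lengths is spurious by construction.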


VIII. CONCLUSION 

Causal inference using observational data alone is a 
difficult task 0. This problem is important in many fields, 
but in physics in particular, there are often subfields for 
which direct experimentation is not technologically 
feasible. 

Exploratory causal analysis, as it has been described 
here, involves many different techniques, including 
Granger causality (GC), transfer entropy (TE), cross 
correlation (CC), and state space reconstruction (SSR). 
Each of these techniques has well-known shortcomings. 
For example, GC is parametric, TE can be computationally 
difficult [ 2 ^, CC can be unreliable Q, and SSR relies 
on correctly setting lag times and embedding dimensions 
[13]. Causal leaning has been introduced to overcome 
many of these shortcomings: it is non-parametric, may 
be based on counting, and the only adjustable parameters 
are the tolerance domains and cause-effect assignments. 
It can be shown that a Granger causality statistic and 
transfer entropy imply causal inferences that agree with 
intuition for examples similar to those shown in Sections 
IV A, IV C, IV D, and IV E. Thus, the leaning is not unique 
in its ability to provide intuitive causal inferences for 
these simple examples, despite having a different operational 
definition of causality than either of these tools. It can also 
be shown that a Granger causality statistic can provide 
the counter-intuitive causal inference for the snowfall 
example shown in Section EH, for which the leaning 
provides the intuitive causal inference. Understanding the 
differences and similarities between different time series 
causality measures (with different operational definitions 
of causality) may help provide a deeper understanding of 
how time series causality relates (in general) to 
fundamental or intuitive notions of causality. These ideas will 
be explored in future work. 

No attempt has been made to interpret causal leanings 
in terms of current philosophical causality studies. For 
example, there is no exploration of how causal leanings 
are associated with token or prima facie causality @. We 
have grouped the leaning method under the broad term 
of time series causality inference, which implies the 
technique is distinct from other data causality methods, 
including directed acyclic graph (DAG) 0 and temporal logic 
0 techniques. Causal leanings have been introduced here 
as a practical tool, and connections with the broader fields 
of data causality and causality foundations are left for 
future work. For example, leanings may be a subset of the 
more general temporal logic presented by Kleinberg Q 
and may have interpretations within Good’s probabilistic 
causal framework of propensities and weights of evidence 
0. 

There are many open questions regarding the use of 
leanings for causal inference that have not been explored 
in this article. For example, how should the magnitude of 
the leaning calculations be interpreted? If there are two 
weighted mean observed leanings $\lambda_1$ and $\lambda_2$ with different 
cause-effect assignments such that $\lambda_1 > \lambda_2$, does the 
first cause-effect assignment represent a “stronger” driver 
than the second? How should leanings of 0, 2, or −2 be 
interpreted with respect to the cause-effect assignments? 
And how should leanings calculated using different 
cause-effect assignments be compared? 

There has also been no formal exploration of using 
leanings as part of statistical tests, as is often done with 
GC [0. The use of histograms in Figure IHl may be 
considered a first step toward statistical interpretations of 
leanings. 

Finally, this article has discussed the use of leanings as 
part of an exploratory causal analysis of time series data. 
Exactly how such an analysis should be conducted is, 
however, still an open question. For example, given a 
report of GC, TE, SSR, and leanings (all calculated in 
various ways), how should the results be interpreted 
holistically? There are many potentially confusing scenarios 
in which, e.g., two techniques lead to opposite causal 
inferences. The most reasonable time series causality 
techniques to use for a given exploratory causal analysis may 
depend strongly on the data itself, but general guidelines 
for such analyses are, as far as we know, not yet available. 